
    Noise-robust text-dependent speaker identification using cochlear models

    One challenging issue in speaker identification (SID) is achieving noise-robust performance. Humans can accurately identify speakers even in noisy environments. We can leverage our knowledge of the function and anatomy of the human auditory pathway to design SID systems that achieve better noise-robust performance than conventional approaches. We propose a text-dependent SID system based on a real-time cochlear model called the cascade of asymmetric resonators with fast-acting compression (CARFAC). We investigate the SID performance of CARFAC on signals corrupted by noise of various types and levels, and compare it with conventional auditory feature generators, including mel-frequency cepstral coefficients and frequency-domain linear prediction, as well as another biologically inspired model, the auditory nerve model. We show that CARFAC outperforms the other approaches when signals are corrupted by noise. Our results are consistent across datasets, types and levels of noise, speaking speeds, and back-end classifiers. We show that the noise-robust SID performance of CARFAC is largely due to its nonlinear processing of auditory input signals. Presumably, the human auditory system achieves noise-robust performance via inherent nonlinearities as well.
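To make the comparison concrete, the following is a minimal sketch of the conventional MFCC front end the abstract uses as a baseline (framing, power spectrum, mel filterbank, log compression, DCT-II). It is a generic illustration with assumed parameter values (16 kHz sampling, 25 ms frames, 26 filters, 13 coefficients), not the authors' implementation:

```python
import numpy as np

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular mel-spaced filters mapping an FFT power spectrum to band energies."""
    def hz_to_mel(f):
        return 2595.0 * np.log10(1.0 + f / 700.0)
    def mel_to_hz(m):
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb

def mfcc(signal, sr=16000, frame_len=400, hop=160, n_filters=26, n_ceps=13):
    """Conventional MFCC front end: framing, power spectrum, mel energies, log, DCT-II."""
    n_fft = frame_len
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frames.append(signal[start:start + frame_len] * np.hamming(frame_len))
    frames = np.array(frames)
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    fb = mel_filterbank(n_filters, n_fft, sr)
    energies = np.maximum(power @ fb.T, 1e-10)  # floor to avoid log(0)
    log_e = np.log(energies)
    # DCT-II decorrelates the log band energies; keep the first n_ceps coefficients
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.outer(np.arange(n_ceps), (2 * n + 1)) / (2 * n_filters))
    return log_e @ basis.T

# Example: features for one second of noise-like audio
rng = np.random.default_rng(0)
feats = mfcc(rng.standard_normal(16000))
print(feats.shape)  # (98, 13): 98 frames, 13 cepstral coefficients
```

The log compression step is the conventional nonlinearity; CARFAC replaces this fixed pipeline with a nonlinear, compressive cochlear filter cascade, which the abstract credits for its noise robustness.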

    Investigation of auditory nerve model and conventional approaches in noise-robust speaker identification

    Automatic speaker identification (SID) is in growing demand for human-machine interaction in fields such as self-driving vehicles, access to smartphones and laptops, and online security. These services become challenging when background noise is present. To achieve noise-robust performance in adverse conditions, we propose two front-end feature extraction algorithms based on the Auditory Nerve (AN) model. One uses the energies of the Inner Hair Cell (IHC) response from the AN model. The other uses the energies of the AN model's linear chirp filter, followed by cubic-root compression and the Discrete Cosine Transform (DCT). We investigate which algorithm performs better in the SID task, using a modified Gammatone Filter Cepstral Coefficient (GFCC) front end as a reference. We tested these algorithms on text-dependent and text-independent speech under clean and noisy conditions. This work shows that the proposed algorithms substantially outperform a previously proposed algorithm based on the AN model. The algorithms with conventional nonlinearities significantly outperform the IHC algorithm in the noise-robust SID task; however, applying conventional nonlinearities to the IHC algorithm significantly improves its SID performance.
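The second algorithm's back half (cubic-root compression followed by DCT-II, as in GFCC-style front ends) can be sketched as below. The band energies here are random stand-ins for the AN model's linear chirp filter outputs, which the abstract does not specify in detail:

```python
import numpy as np

def cepstral_features(band_energies, n_ceps=13):
    """Cubic-root compression followed by DCT-II over filterbank band energies.

    `band_energies` is a (frames, bands) array of non-negative filter outputs;
    here they stand in for the AN model's linear chirp filter energies.
    """
    compressed = np.cbrt(np.maximum(band_energies, 0.0))  # cubic root in place of log
    frames, bands = compressed.shape
    # DCT-II basis; keep the first n_ceps coefficients as the feature vector
    n = np.arange(bands)
    basis = np.cos(np.pi * np.outer(np.arange(n_ceps), (2 * n + 1)) / (2 * bands))
    return compressed @ basis.T

# Example with random stand-in energies from an assumed 64-band filterbank
rng = np.random.default_rng(1)
energies = rng.random((100, 64))
feats = cepstral_features(energies)
print(feats.shape)  # (100, 13)
```

The cubic root is the "conventional nonlinearity" the abstract contrasts with the raw IHC response: like the log in MFCC, it compresses the dynamic range of band energies before decorrelation by the DCT.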